class: center, middle, inverse, title-slide

.title[
# Data visualization
]
.author[
### James Ashmore • 22-Oct-2022
]
.institute[
### Zifo RnD Solutions
]

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

<!-- ------------ Only edit title, subtitle & author above this ------------ -->

```r
knitr::opts_chunk$set(echo = FALSE, fig.align = "center")
```

---

## What is data visualization?

* This might seem like a *trivial* question, but how would you answer it?
* Technical writer Kate Brush gives a great one-line definition:

> Data visualization is the practice of translating information into a visual context, such as a map or graph, to make data easier for the human brain to understand and pull insights from

<br>

.pull-left-50[
* The important point here is that data visualization is for **humans**
* The human brain is still often better than a computer at spotting unexpected patterns or outliers
* The advent of genomics resulted in many new types of data and file formats
* New data types meant researchers had to come up with new visualization methods
]

.pull-right-50[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#data/visualization/xkcd-self-driving.png" alt="xkcd: Self Driving" width="75%" />
<p class="caption">xkcd: Self Driving</p>
</div>
]

---

## Why is data visualization important?

.pull-left-50[
* Data visualization is useful for a variety of reasons:
  * Data cleaning
  * Exploring data structure
  * Detecting outliers and unusual groups
  * Identifying trends and clusters
  * Presenting results
* By its very nature genomic data is often very **large** and **complex**
* There is also a huge variety of different genomic data **types**
* How can we *effectively* visualize all of this data?

**Genome browsers to the rescue!**
]

.pull-right-50[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#data/visualization/genomic-data.png" alt="Common data types" width="100%" />
<p class="caption">Common data types</p>
</div>
]

---

## What is a genome browser?

* In bioinformatics, a **genome browser** is a graphical interface for displaying genomes
* Genome browsers enable researchers to browse genomes alongside **genomic data**
* Types of genomic data commonly displayed include:
  * Genotyping
  * Gene expression
  * Epigenetics
* A large number of genome browsers are available, many of them **free**
* The best-known genome browsers include:
  * Online genome browsers:
    * [UCSC Genome Browser](https://genome.ucsc.edu)
    * [Ensembl Genome Browser](https://www.ensembl.org/index.html)
    * [NCBI Genome Data Viewer](https://www.ncbi.nlm.nih.gov/genome/gdv/)
  * Desktop genome browsers:
    * [Integrative Genomics Viewer](https://software.broadinstitute.org/software/igv/)
    * [Integrated Genome Browser](https://www.bioviz.org)

---

## How do genome browsers work?
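
At its core, a browser track is a set of features, each located by chromosome name, start, and end. The browser draws only the features that overlap the region in view. Below is a minimal sketch of that idea, assuming 1-based inclusive coordinates. The `Feature` type, the `visible` helper, and the example features are hypothetical and not taken from any real browser:

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One glyph on a track (1-based, inclusive coordinates assumed)."""
    chrom: str
    start: int
    end: int
    name: str


def visible(features: list[Feature], chrom: str, start: int, end: int) -> list[Feature]:
    """Return the features that overlap the window chrom:start-end."""
    return [
        f for f in features
        if f.chrom == chrom and f.start <= end and f.end >= start
    ]


# Hypothetical features, invented purely for illustration.
track = [
    Feature("chr1", 100, 200, "exon_a"),
    Feature("chr1", 500, 650, "exon_b"),
    Feature("chr2", 100, 200, "exon_c"),
]

# Features a browser would need to draw for the window chr1:150-550.
in_view = visible(track, "chr1", 150, 550)
print([f.name for f in in_view])
```

Only `exon_a` and `exon_b` overlap the window, so only those two would be drawn.

---

## How do genome browsers work?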
<br>

<img src="data:image/png;base64,#data/visualization/coord-system.png" width="100%" style="display: block; margin: auto;" />

<br>

* Genome browsers use a **coordinate system** to display genomic data
* Data is displayed on tracks using **glyphs**
* Coordinates determine the **position** of glyphs
  * Chromosome name
  * Start position
  * End position
* Glyphs can accommodate additional information
  * Score (e.g., read coverage)
  * Relationship (e.g., exons within a transcript)
  * Variation (e.g., SNPs and indels)

---

## Genome coordinate systems

<br>

<img src="data:image/png;base64,#data/visualization/zero-vs-one.png" width="60%" style="display: block; margin: auto;" />

<br>

* Confusingly, there are actually *two* coordinate systems:

.pull-left-50[
* 0-based
  * Reference starts at zero
  * Numbers the positions *between* nucleotides
  * UCSC Genome Browser data tables
  * File formats: `BED` `BAM`
]

.pull-right-50[
* 1-based
  * Reference starts at one
  * Numbers nucleotides *directly*
  * Ensembl Genome Browser and the UCSC Genome Browser display
  * File formats: `GFF` `SAM` `VCF`
]

---

## Genomic file formats

* When loading a data file, genome browsers use the file extension to determine the file format
* The file format sets the data type and glyph display options:
  * Segmented copy number: `seg`
  * Sequence alignments: `bam` `cram`
  * Genome annotations: `bed` `gtf` `gff3` `psl` `bigbed`
  * Quantitative data: `wig` `bedgraph` `bigwig` `tdf`

<br>

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#data/visualization/igv-types.jpg" alt="Common glyph display options" width="70%" />
<p class="caption">Common glyph display options</p>
</div>

---

## Online genome browsers

### UCSC

<iframe src="https://genome.ucsc.edu" width="100%" height="500px" data-external="1"></iframe>

---

## Online genome browsers

### Ensembl

<iframe src="https://www.ensembl.org/index.html" width="100%" height="500px" data-external="1"></iframe>

---

## Desktop genome browsers

### IGV

<iframe src="https://software.broadinstitute.org/software/igv/"
width="100%" height="500px" data-external="1"></iframe>

---

## Desktop genome browsers

### IGB

<iframe src="https://www.bioviz.org" width="100%" height="500px" data-external="1"></iframe>

---

## Integrative Genomics Viewer

### Overview

.pull-left-70[
* Enables intuitive real-time exploration of diverse, large-scale genomic data
* Supports flexible integration of a wide range of genomic data types including:
  * Aligned sequence reads
  * Mutations
  * Copy number
  * RNA interference screens
  * Gene expression
  * Methylation
  * Genomic annotations
* Navigation allows the user to zoom and pan seamlessly across the genome
* Data can be loaded from local or remote sources, including cloud-based resources
]

.pull-right-30[
<img src="data:image/png;base64,#data/visualization/igv-logo.png" width="50%" style="display: block; margin: auto;" />
]

---

## Integrative Genomics Viewer

<br>

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#data/visualization/igv-layout.jpeg" alt="The IGV application window" width="80%" />
<p class="caption">The IGV application window</p>
</div>

---

## Integrative Genomics Viewer

<br>

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#data/visualization/igv-coverage.webp" alt="Coverage plot and alignments from paired-end reads for a matched tumor/normal pair" width="90%" />
<p class="caption">Coverage plot and alignments from paired-end reads for a matched tumor/normal pair</p>
</div>

---

## Integrative Genomics Viewer

<br>

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#data/visualization/igv-alignment.jpeg" alt="Read alignment views at 20 kb and base pair resolution" width="80%" />
<p class="caption">Read alignment views at 20 kb and base pair resolution</p>
</div>

---

## Integrative Genomics Viewer

<br>

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#data/visualization/igv-attribute.jpeg" alt="The attribute panel displays a
color-coded matrix of phenotypic and clinical data" width="52%" />
<p class="caption">The attribute panel displays a color-coded matrix of phenotypic and clinical data</p>
</div>

---

## Public data examples

### https://doi.org/10.1016/j.cell.2016.12.016

<br>

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#data/visualization/example-figure-1.png" alt="Snapshot of genomic data at a candidate genomic locus" width="61%" />
<p class="caption">Snapshot of genomic data at a candidate genomic locus</p>
</div>

---

## Public data examples

### https://doi.org/10.7554/eLife.22631

<br>

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#data/visualization/example-figure-2.jpg" alt="Genomic snapshots illustrating ChIP-seq signal before and after treatment" width="65%" />
<p class="caption">Genomic snapshots illustrating ChIP-seq signal before and after treatment</p>
</div>

---

## Summary

* Data visualization is the process of transforming data into a visual context, like a map or graph
* Data visualization makes it easier for the human brain to understand data and identify patterns
* The advent of genomics created new data types, which required new visualization methods
* Genome browsers allow you to display genomic data from multiple technologies
* Many genome browsers exist, some available online and others as desktop applications
* Ultimately, the best way to understand genome browsers is to explore them hands-on

---

## Resources

* [Data File Formats](https://genome.ucsc.edu/FAQ/FAQformat.html)
* [IGV User Guide](https://software.broadinstitute.org/software/igv/UserGuide)
* [Ensembl Tutorials](https://www.ensembl.org/info/website/tutorials/index.html)
* [UCSC Training](https://genome.ucsc.edu/training/)

<!-- --------------------- Do not edit this and below --------------------- -->

---
name: end_slide
class: end-slide, middle
count: false

# Thank you. Questions?

.end-text[
<p class="smaller">
<span class="small" style="line-height: 1.2;">Graphics from </span><img src="./assets/freepik.jpg" style="max-height:20px; vertical-align:middle;"><br>
Created: 22-Oct-2022 • James Ashmore • <a href="https://www.zifornd.com/category/omics-bioinformatics">Bioinformatics</a> • <a href="https://www.zifornd.com">Zifo</a>
</p>
]